ggClusterNet: An R package for microbiome network analysis and modularity‐based multiple network layouts

Abstract Network analysis has attracted increasing attention from ecologists, so more convenient and powerful tools are needed. We therefore developed an R package named “ggClusterNet” to perform and display network analysis more easily. The package provides ten network layout algorithms designed to better display the modules of microbiome networks (randomClusterG, PolygonClusterG, PolygonRrClusterG, ArtifCluster, randSNEClusterG, PolygonModsquareG, PolyRdmNotdCirG, model_Gephi.2, model_igraph, and model_maptree). For users' convenience, many functions related to microbial network analysis, such as corMicro(), net_properties(), node_properties(), ZiPiPlot(), and random_Net_compate(), are integrated to complete the network mining. Furthermore, the pipeline functions network.2() and corBionetwork() are provided to quickly complete network or bipartite network analysis and their in‐depth mining. ggClusterNet is publicly available via GitHub (https://github.com/taowenmicro/ggClusterNet/) or Gitee (https://gitee.com/wentaomicro/ggClusterNet). A complete description of its usage can be found on the manuscript's GitHub page (https://github.com/taowenmicro/ggClusterNet/wiki).


Highlights
• ggClusterNet is an R package for microbial networks.

INTRODUCTION
In the past two decades, the rapid development of high-throughput sequencing technology has contributed to progress in microbial community studies and in related bioinformatics tools [1]. Among the important and common methods of microbiome analysis, network analysis and network thinking [2] have been widely used by biologists, mathematicians, social scientists, and computational scientists to explore interactions between entities, be they individuals in a school [3], species in a food web [4], nodes on a computer network [5], proteins in metabolic pathways [6], or group comparisons in a Venn network [7]. Network analysis has been used to explore the mathematical, statistical, or structural properties of a set of items (nodes) and the connections between them (edges) [8]. It is also widely applied to explore co-occurrence patterns between microbial taxa within complex communities. For example, Ma et al. highlighted the interconnection patterns across microbiomes in various environments and used network analysis to emphasize the importance of the co-occurrence features of microbiomes [9]; Yuan et al. found that climate changes enhanced the complexity and stability of microbial networks [10].
Tools for network analysis of microbiomes include the web tool MENA [11] (MENAP), R packages (WGCNA [12], igraph [13], ggraph [14], and SpiecEasi [15]), interactive software (Cytoscape [16] and Gephi [17]), Python packages (NetworkX [18] and SparCC [19]), and so forth. Many of these tools can be used to construct networks. For example, MENA was specifically designed for microbiome data; it is easy to use and, being based on the Random Matrix Theory (RMT) method, robust against noise. WGCNA constructs a scale-free weighted gene network based on a soft thresholding power. SpiecEasi combines data transformations developed for compositional data analysis with a graphical model inference framework, and is accompanied by a set of computational tools that generate operational taxonomic unit (OTU) count data from a set of diverse underlying network topologies. Some tools, such as Cytoscape, Gephi, and R packages (igraph, ggraph, etc.), integrate network visualization. Cytoscape not only provides a wide range of powerful visualization schemes but also allows users to add new features through many plugins; Gephi can produce an aesthetically pleasing network visualization in a few steps. The R visualization packages (igraph, ggraph) can also display a network quickly and reproducibly from the command line. However, none of these tools easily covers all aspects of network visualization, namely multiple visualization layouts, rapid calculation of results, and easy reproducibility. For example, Cytoscape requires tedious steps and is not easily reproducible, and the layouts in igraph and ggraph are not aesthetically appealing enough.
Currently, more researchers are focusing on the interactions among microbial network modules to explore their functions. However, the existing tools struggle with the large number of connections among the many microbial species in a network and lack suitable visualization schemes. To address the rising demand for network analysis and visualization, we developed the R package ggClusterNet in the R language [20]. It provides a fast, automated, and easy-to-use pipeline for network analysis with multiple powerful visualizations (Figure 1). ggClusterNet has the following characteristics: (1) the network analysis pipeline runs quickly; (2) the pipeline provides multiple refined visualization layouts; and (3) the pipeline can be completed with a few lines of code and is reproducible. We expect that our tool will promote studies in microbiology and attract more collaboration with individuals and institutions for its maintenance and development.

METHODS
ggClusterNet is a package and network pipeline developed in the R language environment. During its development, the functions cor() and Pvalue() (in the WGCNA package), sparccboot() (in the SpiecEasi package), and corr.test() (in the psych package) were used to calculate correlations, and the layout algorithms in ggraph and sna were used as references. Functions in igraph (average.path.length(), edge.connectivity(), no.clusters(), centralization.closeness(), erdos.renyi.game(), etc.) were used for deep mining of networks. All scripts were deposited in GitHub (https://github.com/taowenmicro/ggClusterNet/). The package can be installed in R with the command devtools::install_github("taowenmicro/ggClusterNet").
To facilitate the use of multiple visualization layouts and the network analysis pipeline, we developed a freely available R package named ggClusterNet, which connects network visualization and deep mining (Figure 1). Ten network visualization layout algorithms were encapsulated in ggClusterNet as functions: randomClusterG, PolygonClusterG, PolygonRrClusterG, ArtifCluster, randSNEClusterG, PolygonModsquareG, PolyRdmNotdCirG, model_Gephi.2, model_igraph, and model_maptree. These functions can be invoked individually. All of these layout algorithms require correlation matrices as input, so ggClusterNet provides the functions corMicro() and corBigMicro(), which can compute several kinds of correlation matrices ("spearman," "pearson," "kendall," and "sparcc") to meet the needs of the layouts.
To make ggClusterNet available to users, we have made the R package publicly available via GitHub (https://github.com/taowenmicro/ggClusterNet/) and Gitee (https://gitee.com/wentaomicro/ggClusterNet). More information, including a user guide, an example script, and an extensive wiki, can be found on GitHub. A complete description of the data and usage can be found on the manuscript's GitHub page (https://github.com/taowenmicro/ggClusterNet/wiki).
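As a rough illustration of what a correlation step of this kind produces, the following Python sketch computes Spearman correlations between taxa and keeps the strongly correlated pairs as network edges. It is not the code of corMicro(), which is an R function. The helper names (average_ranks, pearson, spearman_edges), the toy abundance table, and the r_threshold parameter are all assumptions made for this example, and significance testing is left out.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treated as no edge
    return sxy / sqrt(sxx * syy)


def spearman_edges(abundance, r_threshold=0.6):
    """abundance: dict taxon -> abundances across samples (hypothetical input shape).

    Spearman correlation is the Pearson correlation of the ranks.
    Pairs with |r| >= r_threshold (an assumed cut-off) become edges.
    """
    ranked = {taxon: average_ranks(vals) for taxon, vals in abundance.items()}
    edges = []
    for a, b in combinations(sorted(ranked), 2):
        r = pearson(ranked[a], ranked[b])
        if abs(r) >= r_threshold:
            edges.append((a, b, round(r, 3)))
    return edges


# Toy abundance table: four taxa measured in five samples.
otu = {
    "OTU1": [10, 20, 30, 40, 50],
    "OTU2": [1, 3, 2, 5, 8],
    "OTU3": [9, 7, 6, 2, 1],
    "OTU4": [5, 1, 4, 2, 3],
}
edges = spearman_edges(otu, r_threshold=0.8)
for a, b, r in edges:
    print(a, b, r)
```

On this toy table the three strongly correlated pairs (OTU1-OTU2, OTU1-OTU3, OTU2-OTU3) are kept and OTU4 stays unconnected. The resulting edge list is the kind of input a layout algorithm needs.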

Workflow in ggClusterNet
In ggClusterNet, corMicro() (correlation matrix calculation for microbiome networks) or corBiostripe() [21] (correlation matrix calculation for bipartite networks) is used to calculate the correlation matrices (Figure 1). More than 10 layout algorithms were designed to calculate the coordinates for visualization, and the results are plotted with ggplot2. The functions net_properties(), node_properties(), and ZiPiPlot() were integrated to calculate network properties, node properties, and the role of each node according to its module, respectively. The function random_Net_compate() was implemented to generate random networks following the null model and to compare their properties with those of the experimental network (Figure 1). These functions are all included in network.2() (the pipeline for microbiome networks) or corBionetwork() (the pipeline for bipartite networks) (Figure 2). Together, these functions allow ggClusterNet to complete a whole microbiome or bipartite network analysis, from correlation calculation, network visualization, and calculation of network and node properties to the construction of random networks and the comparison with them.
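The null-model comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of random_Net_compate(): the null model is assumed to be an Erdős-Rényi G(n, m) graph with the same numbers of nodes and edges as the observed network, the compared property is the global clustering coefficient, and the function names and the toy network are hypothetical.

```python
import random
from itertools import combinations


def random_graph_like(n_nodes, n_edges, rng):
    """Erdos-Renyi G(n, m): draw n_edges distinct node pairs uniformly at random."""
    all_pairs = list(combinations(range(n_nodes), 2))
    return rng.sample(all_pairs, n_edges)


def transitivity(n_nodes, edges):
    """Global clustering coefficient: closed triples / all connected triples."""
    nbrs = [set() for _ in range(n_nodes)]
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    closed = 0   # each triangle is counted three times, once per centre node
    triples = 0  # paths of length two centred on each node
    for node in range(n_nodes):
        k = len(nbrs[node])
        triples += k * (k - 1) // 2
        for a, b in combinations(sorted(nbrs[node]), 2):
            if b in nbrs[a]:
                closed += 1
    return closed / triples if triples else 0.0


# Toy "observed" network: two triangles joined by one bridge edge.
n_nodes = 6
observed = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
observed_c = transitivity(n_nodes, observed)

# Null distribution from 200 random graphs with the same size (fixed seed).
rng = random.Random(42)
null_graphs = [random_graph_like(n_nodes, len(observed), rng) for _ in range(200)]
null_values = [transitivity(n_nodes, g) for g in null_graphs]
null_mean = sum(null_values) / len(null_values)

print("observed clustering:", observed_c)
print("mean clustering of random networks:", round(null_mean, 3))
```

Comparing the observed value with the distribution of null values indicates whether the experimental network is more clustered than expected by chance. The same pattern applies to other properties, such as average path length.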

Network layout
To better visualize microbial networks and highlight their modules, we developed 10 layout algorithms for visualization (Figure 3). The corresponding functions are described below:
randomClusterG: Nodes of a module (group) were all arranged into one ring. Multiple modules were plotted as multiple circles of the same radius. Then, a function was designed to arrange the rings randomly in the plot panel.
ArtifCluster: Nodes of a module (group) were all arranged into one ring. Multiple modules were plotted as multiple circles of the same radius. Then, those circles were arranged manually by setting the coordinate values.
randSNEClusterG: Nodes of each module (group) were arranged by one of the multiple layouts in the sna package [22]. Multiple modules were arranged randomly in the plot panel.
PolygonModsquareG: Nodes of a module (group) were all arranged into one ring. Multiple modules were plotted as multiple circles of different radii (the more nodes, the larger the radius). Then, those circles were arranged manually into one or more rows.
model_maptree: Modularity analysis was first conducted on the network, and the nodes, grouped by network module, were then used to calculate the coordinates. The relative positions of the nodes were calculated based on the algorithm developed by Weixin Wang et al., which tried to find the densest packing of circles as they were added, one by one [23].
model_igraph: All nodes were placed on the plane using the force-directed layout algorithm by Fruchterman and Reingold. Nodes with a high degree, as calculated with the R package igraph, tended to be grouped, while nodes with a low degree were distributed in the surrounding network.
model_Gephi.2: All nodes were first placed on a circle and their coordinates were calculated. Then, the coordinates of each node were used to form clusters and were reassigned to the nodes.
PolyRdmNotdCirG: Nodes were randomly distributed in multiple circles of different radii (the more nodes, the larger the radius) according to module information. Then, those modules were arranged regularly at the vertices of a polygon (the number of edges is equal to the number of modules) centered on the origin of the coordinate axes.
PolygonRrClusterG: Nodes of a module (group) were arranged into one ring. Multiple modules were plotted as multiple circles of different radii (the more nodes, the larger the radius). Then, those circles were arranged regularly at the vertices of a polygon centered on the origin of the coordinate axes.
PolygonClusterG: Nodes of a module (group) were all arranged into one ring. Multiple modules were plotted as multiple circles of the same radius. Then, those circles were arranged regularly at the vertices of a polygon (the number of edges is equal to the number of modules) centered on the origin of the coordinate axes.
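The geometry behind the ring-and-polygon layouts can be illustrated with a short Python sketch in the spirit of PolygonClusterG: each module becomes a ring of equal radius, and the ring centres sit on the vertices of a regular polygon around the origin. This is not the package's R implementation. The function names and the parameters ring_radius and polygon_radius are assumptions introduced for the example.

```python
from math import cos, sin, pi


def ring(n_points, radius, centre=(0.0, 0.0)):
    """Place n_points evenly on a circle of the given radius around centre."""
    cx, cy = centre
    return [
        (cx + radius * cos(2 * pi * i / n_points),
         cy + radius * sin(2 * pi * i / n_points))
        for i in range(n_points)
    ]


def polygon_cluster_layout(modules, ring_radius=1.0, polygon_radius=5.0):
    """modules: dict module name -> list of node names (hypothetical input shape).

    Ring centres are the vertices of a regular polygon with one vertex per
    module; every module's nodes are then spread on a ring of equal radius.
    Returns a dict node name -> (x, y).
    """
    centres = ring(len(modules), polygon_radius)
    coords = {}
    for centre, (_, nodes) in zip(centres, sorted(modules.items())):
        for node, xy in zip(nodes, ring(len(nodes), ring_radius, centre)):
            coords[node] = xy
    return coords


# Toy example: three modules, so the ring centres form a triangle.
modules = {
    "M1": ["a", "b", "c"],
    "M2": ["d", "e", "f", "g"],
    "M3": ["h", "i"],
}
coords = polygon_cluster_layout(modules)
for node in sorted(coords):
    x, y = coords[node]
    print(node, round(x, 2), round(y, 2))
```

A variant closer to PolygonRrClusterG would make ring_radius grow with the number of nodes in each module instead of using one shared value. The resulting coordinates can be passed to any plotting library together with the edge list.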

DISCUSSION
Microbial ecology researchers have gradually come to favor network analysis [24], and many powerful tools have therefore been developed, such as Cytoscape, Gephi, and igraph. Compared with these earlier network analysis tools, ggClusterNet shows clear advantages. Cytoscape and Gephi are popular for visualization because of their interactive graphical user interfaces and attractive results. Cytoscape offers powerful functions for network analysis, but many parameters need to be adjusted to visualize the modules in a network. Gephi can display the modules in a network using default parameters, but offers less for deep mining of the network. R packages (e.g., igraph, ggraph, sna) supply many topological properties during network analysis, but their visualizations are less aesthetically pleasing than those generated by Gephi and Cytoscape. ggClusterNet integrates the advantages of igraph, ggraph, and sna and provides multiple layout algorithms, and its pipelines (network.2() and network()) can both show the modules within networks and support deep mining of the network.
Future work will continue to develop ggClusterNet. To enhance microbiome network analysis, the pipeline will be extended with analyses of the stability of network dynamics and with deeper mining of the role of modules in maintaining network stability. Shiny will then be used to build a user-friendly interface, making it more convenient for more investigators to explore networks in detail. In addition, further mining for bipartite network analysis and the development of more layout algorithms suitable for bipartite networks will be conducted in our future work.

FIGURE 3 Ten layout algorithms of network visualization in ggClusterNet

FIGURE 1 The workflow and tools showing the steps of network construction and analysis in ggClusterNet

FIGURE 2 The workflow of the pipeline functions network.2() and corBionetwork(), including input data types and output results